I. Introduction



Recently, the tools of statistical mechanics have been extensively applied to the study of
collective properties of neural networks [1]; spin glass theory has played an important role
in the growth of this new field [2]. In particular, the optimal (error-free) storage capacity
for recurrent networks can be obtained by calculating the typical fractional volume of the
space of interactions ($\{J_{ij}\}$) satisfying the condition that, for a given set of patterns, each
pattern is a fixed point of the deterministic (zero-temperature) dynamics

$$S_i(t+1) = \mathrm{sgn}\Big[\sum_j J_{ij} S_j(t)\Big], \tag{1}$$

where $S_i(t)$ $(= \pm 1)$ $(i = 1, \ldots, N)$ represents the state of the $i$th neuron at time $t$, and
the synaptic coupling $J_{ij}$ determines the contribution of a signal fired by the $j$th neuron
to the action potential on the $i$th neuron. This approach of systematic exploration of
the space of interactions, which was pioneered by Gardner [3] and reformulated in terms
of a canonical ensemble calculation [4], has been applied in various directions [5-11]. The
Hopfield model with general continuous couplings has been found to be capable of storing
at most two uncorrelated random patterns per neuron without errors, and a larger number
of patterns when the patterns are biased [3]. The network with discrete (Ising-type) couplings has
also been extensively investigated since the replica-symmetric theory was reported to yield
wrong results for the optimal storage capacity [4], [6], [7]. The method is not limited to
Hopfield-type neural networks but is applicable to multilayer networks as well as to simple
perceptrons [5], [11]. However, Gardner's method is based on the concept of fixed points
of the dynamics, and obviously does not work if the dynamics is stochastic (i.e., at finite
temperatures). In addition, it requires perfect matching, so that each pattern is recalled
without error at every site, whereas in practice one usually considers a neural network to be
remembering or recalling a pattern if the overlap between the network state and one of the patterns
is larger than some given value.

In this paper, we propose a scheme to define the optimal storage capacity at finite
temperatures and study its temperature dependence. We introduce the tolerance parameter
$m$ $(< 1)$ in such a way that the $m \to 1$ limit corresponds to perfect recall while $(1-m)/2$
measures the error allowed. We then calculate the typical fractional volume of the space of
interactions for extremely-diluted networks as a function of the storage ratio $\alpha$, temperature
$T$, and the tolerance parameter $m$, which leads to the optimal storage capacity $\alpha_c$ as a
function of $T$ and $m$. At zero temperature it is found that $\alpha_c = 2$ regardless of the tolerance
parameter $m$. At finite temperatures, on the other hand, the optimal storage capacity
vanishes in the perfect matching limit ($m \to 1$) and in general increases with the tolerance.
We then discuss how the best performance is obtained for given $\alpha$ and $T$. We also propose
an alternative criterion for recalling, which may be regarded as a simple approximate
scheme to define the optimal storage capacity, and consider the optimal storage capacity of
the dynamic model as well as of the extremely-diluted model in this approximate scheme.

The contents of this paper are as follows: In Sec. II, we propose a scheme to define 
the optimal storage capacity at finite temperatures together with an approximate scheme. 
Section III is devoted to the calculation of the optimal storage capacity of an extremely-diluted
neural network while Sec. IV presents results of the approximate scheme. A brief
discussion is given in Sec. V. 



II. Optimal storage capacity at finite temperatures 

One usually takes into account internal noise in the functioning of a neuron by extending
the deterministic evolution rule (1) to a stochastic one:

$$P[S_i(t+1) = \pm 1] = \frac{1}{2}\Big\{1 \pm \tanh\Big[\beta \sum_j J_{ij} S_j(t)\Big]\Big\}, \tag{2}$$

where the inverse temperature $\beta$ $(= 1/T)$ measures the width of the threshold region,
i.e., the level of synaptic noise. The state $s = \{s_i\}$ of the network of $N$ neurons evolves
stochastically according to Eq. (2). A given set of states of the network to be memorized
by appropriately adjusting the couplings is called the set of patterns. We now define the
overlap $M_\mu(t)$ between the network state and the $\mu$th pattern $\xi^\mu = \{\xi_i^\mu\}$ $(\mu = 1, \ldots, p)$ by

$$M_\mu(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu s_i(t),$$

which also evolves stochastically along with $s(t)$. When a network is recalling pattern $\mu$, the
time average of $M_\mu(t)$ over a time scale sufficiently longer than the observational time but
shorter than the lifetime of the local energy minimum should be close to unity. However,
since the dynamics is stochastic, it cannot be strictly unity as in the zero-temperature
dynamics. Therefore, we introduce the tolerance parameter $m$ in such a way that the
network is considered to be remembering the $\mu$th pattern if

$$\bar{M}_\mu \equiv \frac{1}{N_t} \sum_{t=1}^{N_t} M_\mu(t) > m$$

with $N_t$ in the appropriate range as mentioned above. The quantity $(1-m)/2$ is the
maximum error allowed for the network to be qualified as functioning. It is expected that
the time average $\bar{M}_\mu$ is equivalent to the restricted thermal average

$$\bar{M}_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \langle s_i \rangle,$$

where the thermal measure is restricted within a single pure state (containing the
configuration $\xi^\mu$) [13]. In the stationary state the activity $\langle s_i \rangle$ of the $i$th neuron is determined by
the coupled equations

$$\langle s_i \rangle = \Big\langle \tanh\Big(\beta \sum_j J_{ij} s_j\Big) \Big\rangle = \tanh\Big(\beta \sum_j J_{ij} \langle s_j \rangle\Big),$$

where a mean-field approximation has been used. Such an approximation is expected to
be valid for diluted networks, which we mainly consider in this work. Otherwise, a reaction
term may be necessary. The optimal storage capacity is given by the upper bound of the
storage ratio $\alpha = p/N$, where $p$ is the number of stored patterns. The problem reduces,
according to Gardner [3], to the calculation of the typical fractional volume of the space
of interactions which satisfies the following conditions:

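$$\frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu \tanh\Big(\frac{\beta}{\sqrt{N}} \sum_j J_{ij} \langle s_j \rangle\Big) > m \tag{3}$$

and

$$\sum_j (J_{ij})^2 = N \quad \text{for each } i. \tag{4}$$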

The condition in Eq. (4) is required to fix the scale of temperature $T$. The optimal storage
capacity at temperature $T$ is then determined by the vanishing of this fractional volume, which
leads to $\alpha_c$ as a function of $T$ and $m$. Application of the above scheme to an extremely-diluted
neural network will be given in the following section. In this model, only an
extremely small fraction of the neurons are connected by couplings, so that its dynamics
can be solved in a rather simple manner [14], [15]. However, calculation of the fractional
volume for other generic neural networks is formidable due to the thermal average to be
performed within one pure state. As a simple attempt, one may use the approximation
$\langle s_j \rangle \approx \xi_j^\mu$ and replace Eq. (3) by

$$\frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu \tanh\Big(\frac{\beta}{\sqrt{N}} \sum_j J_{ij} \xi_j^\mu\Big) > m, \tag{5}$$


which states that the network state evolved by one time step from a given pattern has
overlap with that pattern greater than $m$. This simple criterion presumably leads to results
similar to those of Eq. (3) for $m$ close to unity, where the network is expected to hover
around the configuration $\xi^\mu$ during the recalling state. The validity of this approximate
scheme with regard to diluted networks is discussed in Sec. IV.
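As an illustration of the one-step criterion of Eq. (5), the following minimal sketch evaluates its left-hand side for a given coupling matrix. The function names and the single-pattern coupling matrix are assumptions introduced only for this example (they are not taken from the paper); the couplings are scaled so that the normalization of Eq. (4) holds.

```python
import math

def one_step_overlap(J, xi, beta):
    """Left-hand side of Eq. (5):
    (1/N) sum_i xi_i tanh[(beta/sqrt(N)) sum_j J_ij xi_j]."""
    N = len(xi)
    total = 0.0
    for i in range(N):
        field = sum(J[i][j] * xi[j] for j in range(N)) / math.sqrt(N)
        total += xi[i] * math.tanh(beta * field)
    return total / N

def single_pattern_couplings(xi):
    """Hypothetical couplings storing one pattern, with no self-coupling,
    scaled so that sum_j J_ij^2 = N for each i (Eq. (4))."""
    N = len(xi)
    scale = math.sqrt(N / (N - 1))
    return [[0.0 if i == j else scale * xi[i] * xi[j] for j in range(N)]
            for i in range(N)]

xi = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]   # one illustrative pattern, N = 10
J = single_pattern_couplings(xi)
T = 0.5
overlap = one_step_overlap(J, xi, beta=1.0 / T)
recalled = overlap > 0.9                   # criterion (5) with tolerance m = 0.9
```

For this toy choice every local field equals $\sqrt{N-1}\,\xi_i$, so the overlap reduces to $\tanh(\beta\sqrt{N-1})$, which makes the sketch easy to check by hand.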

III. Extremely-diluted neural network 

In this section the proposed scheme is applied to an extremely-diluted neural network,
where, on the average, there are $C$ $(\ll \log N)$ connections per neuron. Such a model was
first studied by Derrida et al. [14], and the properties of its basins of attraction were later
studied by Gardner [9] and by Amit et al. The reason why we can implement
the scheme exactly is that the dynamics of the network can be solved in a rather simple
manner [14], [15]. Extending the method of Keppler and Abbott [17], we describe the time
evolution of the overlap $M_\mu(t)$ between pattern $\mu$ and the network state by the one-step
recursion relation

$$M_\mu(t+1) = F_{h^\mu}[M_\mu(t)]. \tag{6}$$
Here the map $F_{h^\mu}(x)$ is defined by

$$F_{h^\mu}(x) = \frac{1}{N}\sum_{i=1}^{N} \int Dz\, \tanh\big[\beta\big(\sqrt{1-x^2}\,z + h_i^\mu x\big)\big],$$

where $\beta$ $(= T^{-1})$ is the inverse temperature, $Dz$ denotes the Gaussian measure,
$Dz = \exp(-z^2/2)\,dz/\sqrt{2\pi}$, and the subscript denotes the $\{h_i^\mu\}$-dependence of the map with

$$h_i^\mu = \xi_i^\mu \sum_j \frac{J_{ij}}{\sqrt{C}}\,\xi_j^\mu. \tag{7}$$

In the stationary state, $M_\mu(t)$ approaches $M_\mu(t \to \infty) = M_\mu^*$, whose value is determined
by the stable fixed-point solution $x$ satisfying

$$x = F_{h^\mu}(x) \tag{8a}$$
$$|F'_{h^\mu}(x)| < 1, \tag{8b}$$

where $F'_{h^\mu}(x) \equiv dF_{h^\mu}(x)/dx$ and Eq. (8b) has been imposed to guarantee stability. For a given tolerance
parameter $m$, the network is considered to be remembering the $\mu$th pattern if the value of the
stationary overlap $M_\mu^*$ is greater than $m$.
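The recursion of Eq. (6) and the fixed-point conditions of Eqs. (8) can be explored numerically. The sketch below is only an illustration under stated assumptions: the Gaussian average is approximated on a uniform grid, the local fields `fields` are hypothetical values chosen for the example rather than fields generated from actual couplings, and all function names are invented here.

```python
import math

def gaussian_average(func, z_max=8.0, n=801):
    """Approximate int Dz func(z) on a uniform grid in [-z_max, z_max],
    normalised by the sum of the Gaussian weights."""
    step = 2.0 * z_max / (n - 1)
    num = den = 0.0
    for k in range(n):
        z = -z_max + k * step
        w = math.exp(-0.5 * z * z)
        num += w * func(z)
        den += w
    return num / den

def overlap_map(x, fields, beta):
    """F_h(x) = (1/N) sum_i int Dz tanh[beta (sqrt(1 - x^2) z + h_i x)]."""
    noise = math.sqrt(max(0.0, 1.0 - x * x))
    return sum(
        gaussian_average(lambda z, h=h: math.tanh(beta * (noise * z + h * x)))
        for h in fields
    ) / len(fields)

def stationary_overlap(fields, beta, x0=1.0, steps=100):
    """Iterate M(t+1) = F_h[M(t)] starting from x0."""
    x = x0
    for _ in range(steps):
        x = overlap_map(x, fields, beta)
    return x

fields = [2.0, 1.5, 2.5]      # hypothetical local fields h_i
beta = 2.0                    # T = 0.5
m_star = stationary_overlap(fields, beta)

# Stability check of Eq. (8b) by a central finite difference.
eps = 1e-5
slope = (overlap_map(m_star + eps, fields, beta)
         - overlap_map(m_star - eps, fields, beta)) / (2.0 * eps)
is_stable = abs(slope) < 1.0
remembered = is_stable and m_star > 0.9   # tolerance m = 0.9
```

At $x = 1$ the Gaussian noise term vanishes, so the map reduces to the average of $\tanh(\beta h_i)$, which gives a simple sanity check on the quadrature.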

Now the main quantity to calculate is the fractional volume of the space of interactions
($\{J_{ij}\}$) for which every pattern can be remembered. The normalization condition is now
given by

$$\sum_j (J_{ij})^2 = C$$

instead of Eq. (4). The number of solutions of Eqs. (8) with value greater than $m$
is formally given by

$$\mathcal{N}_{h^\mu} = \int_m^1 dM\, \delta\big(M - F_{h^\mu}(M)\big)\, \big|1 - F'_{h^\mu}(M)\big|\, \theta\big(1 - |F'_{h^\mu}(M)|\big) \tag{9}$$

so that the fractional volume can be written as

$$V = \frac{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big) \prod_\mu \theta(\mathcal{N}_{h^\mu})}{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big)},$$

where $\theta(x)$ is the step function and the number of stored patterns has been scaled according
to $p = \alpha C$. Here we are mainly interested in obtaining the optimal storage capacity $\alpha_c$. If
the number of stored patterns exceeds $\alpha_c C$, there is no typical network ($\{J_{ij}\}$) that yields
a stationary overlap greater than $m$ for all patterns. In the limit $\alpha \to \alpha_c$,
the number $\mathcal{N}_{h^\mu}$ of stable solutions approaches zero, and we may replace the step function
$\theta(\mathcal{N}_{h^\mu})$ by $\mathcal{N}_{h^\mu}$. Furthermore, the fractional volume vanishes only if $\mathcal{N}_{h^\mu} = 0$ for some $\mu$,
which implies that the replacement $\theta(x) \to x$ would not affect the optimal storage capacity.
The fractional volume to calculate is now given by

$$V_0 = \frac{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big) \prod_\mu \mathcal{N}_{h^\mu}}{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big)}. \tag{10}$$



Replacement of $\theta(x)$ by $x$ in general is also justified in the following sense. We may assume
that $\mathcal{N}_{h^\mu}$ typically possesses a finite, system-size-independent upper bound almost
everywhere in the interaction space. This is reasonable since the map $F_{h^\mu}(x)$ is an average
of $N$ functions of the form

$$\int Dz\, \tanh\big[\beta\big(\sqrt{1-x^2}\,z + hx\big)\big].$$



This function is smooth and monotonic in $x$, with derivative having the same sign as $h$.
Therefore the average over many possible $h$ is a sum of two parts: the monotonically
increasing part from contributions of $h > 0$ and the monotonically decreasing part from
$h < 0$. So in practice there are only a few solutions at most. If we denote the upper bound
by $\mathcal{N}_0$, we then have the inequality

$$\theta(\mathcal{N}_{h^\mu}) \le \mathcal{N}_{h^\mu} \le \mathcal{N}_0\, \theta(\mathcal{N}_{h^\mu}).$$

Integrating this over the interaction space, we immediately see that

$$V \le V_0 \le \mathcal{N}_0^{\alpha C}\, V.$$

However, the fractional volumes are of the order of $\exp(-CN)$, so that $(\log V)/CN$ is the
same as $(\log V_0)/CN$ in the thermodynamic limit $N \to \infty$.

In this work, we consider the case that every pattern $\xi_i^\mu$ to be stored is an independently
distributed random variable, taking the value $\pm 1$ with equal probabilities. The typical
fractional volume $V = \exp(\langle\langle \log V_0 \rangle\rangle)$ for the random patterns involves averaging of $\log V_0$
over the distribution of the random patterns $\{\xi_i^\mu\}$, which may be obtained through the
use of the well-known replica trick. To facilitate the averaging over the distribution of the
random patterns, we introduce $\delta$-functions describing Eq. (7) with the help of the conjugate
variable raised to the exponential form

$$\delta\Big(h_i^\mu - \xi_i^\mu \sum_j \frac{J_{ij}}{\sqrt{C}}\,\xi_j^\mu\Big) = \int \frac{d\hat{h}_i^\mu}{2\pi}\, \exp\Big[i\hat{h}_i^\mu\Big(h_i^\mu - \xi_i^\mu \sum_j \frac{J_{ij}}{\sqrt{C}}\,\xi_j^\mu\Big)\Big].$$

The average over the random patterns for the replicated volume $\langle\langle V^n \rangle\rangle$ affects the
exponential factor containing $\xi_i^\mu$ in the above expression, and leads to the following:

H ^ a/3i j ai /3j ^ ' 

where $\alpha$ and $\beta$ are the replica indices and, for an extremely-diluted network, the cumulant
expansion has been cut off at the second order [9]. Following Ref. [9], we assume the second
term to be independent of site $i$, so that



C A^VV 3 C 

A? * A? 



Introducing the local order parameters 



n* = — V T a I 13 

"la/3 — q Z_j 

r { = — V T a T 13 

'a/3 — q Z_j 



together with their respective conjugate variables $Q^i_{\alpha\beta}$ and $R^i_{\alpha\beta}$, we obtain



((V n )) = 



1 



n 



dEf 



eMnCN/2) J % Am J 2ni/C J J£ Ani/C 



where 



cri a</3i af3i 

with Gi and G 2 given by 



exp(G 2 ) = 



dhfdhf 



aij 



a<0 



a(3ij 



n 



2tt 



a/3ij at 



a<j3i 



In the thermodynamic limit ($N \to \infty$), $C$ also approaches infinity, albeit slowly, and $\langle\langle V^n \rangle\rangle$
can be computed through the use of the steepest-descent method. In order to find the
saddle point, we assume the replica- and site-symmetric ansatz



E? = E 



QafS = Q ?a/J = 9 



R l aB - R 



a/3 



(a ?/3) 



With this ansatz, the function $G$ in the limit $n \to 0$ takes the form

$$G = nN\Big[\frac{1}{2}(E - qQ + sS - rR) + g_1 + \alpha g_2\Big],$$



where 



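$$g_1 = -\frac{1}{4}\Big[\frac{Q+R}{E-Q+S-R} + \frac{Q-R}{E-Q-S+R} + \log(E-Q+S-R) + \log(E-Q-S+R)\Big],$$

$$g_2 = \frac{1}{N}\int \prod_{i=1}^{N} Dt_i\, \log \int \frac{dt_0}{\sqrt{2\pi/N}}\, \exp\Big(-\frac{1}{2}N t_0^2\Big) \int \prod_{i=1}^{N} Dh_i\; \mathcal{N}_H.$$
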
In the above expression $\mathcal{N}_H$ is given by Eq. (9) with $h^\mu$ replaced by $H = \{H_i\}$, where
$H_i = \sqrt{1-q}\,h_i - \sqrt{s-r}\,t_0 - \sqrt{q}\,t_i$. Since the saddle-point equations for the variables
$E$, $S$, $Q$, and $R$ are algebraic, we can eliminate these variables and finally write the
typical fractional volume in the form

$$V = \exp\Big\{CN\Big[\frac{1}{2}\log(1-q) + \frac{1}{4}\log(1-x^2) + \frac{q - rx}{2(1-q)(1-x^2)} + \alpha g_2\Big]\Big\},$$

where $x = (s-r)/(1-q)$.

To manipulate $g_2$, we introduce the variable

$$\hat{M} = \frac{\partial}{\partial M} F_H(M)$$

together with its conjugate variable $\hat\lambda$ and use the integral representation of the $\delta$-function.
Noting the range of the variable $\hat{M}$, we obtain



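$$g_2 = \frac{1}{N}\int \prod_{i=1}^{N} Dt_i\, \log \int \frac{dt_0}{\sqrt{2\pi/N}} \int_m^1 dM \int \frac{d\lambda}{2\pi i/N} \int_{-1}^{1} d\hat{M}\,(1-\hat{M}) \int \frac{d\hat\lambda}{2\pi i/N}$$
$$\qquad \times \exp\Big[-N\Big(\frac{t_0^2}{2} + \lambda M + \hat\lambda \hat{M}\Big)\Big] \prod_{i=1}^{N} \int Dh_i\, \exp\big[\lambda f(H_i, M) + \hat\lambda\, \partial_M f(H_i, M)\big],$$
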

where the functions $f(H, M)$ and $\partial_M f(H, M)$ are given by

$$f(H, M) = \int Dz\, \tanh\big[\beta\big(\sqrt{1-M^2}\,z + MH\big)\big],$$
$$\partial_M f(H, M) = \frac{\partial}{\partial M} f(H, M).$$

Now the integration over $\hat{M}$ is easily performed, and in the thermodynamic limit, the
steepest-descent method yields

$$g_2 = \max_{t_0}\, \max_{m < M < 1}\, \min_{\lambda, \hat\lambda} \Big\{-\frac{1}{2}t_0^2 - \lambda M + |\hat\lambda| + \int Dt\, \log \int Dh\, \exp\big[\lambda f(H, M) + \hat\lambda\, \partial_M f(H, M)\big]\Big\}, \tag{11}$$

where $H = \sqrt{1-q}\,h - \sqrt{(1-q)x}\,t_0 - \sqrt{q}\,t$. In Eq. (11), we should take the minimum over
$\lambda$ and $\hat\lambda$ rather than the maximum because the integration over $\lambda$ and $\hat\lambda$ runs along the
imaginary axis in the complex plane, which should be deformed to pass through the saddle point.
When the saddle point happens to lie on the real axis, one may conveniently sweep along
the real axis, and the saddle point corresponds to the minimum point along the real axis.

Since $V$ depends on $s$ only through $x$, it is straightforward to show that $V$ reaches its
maximum at $x = 0$, and it follows that we can set $t_0 = 0$ in Eq. (11). Since $q$ represents
the typical correlations of the solution of Eqs. (8), the typical fractional volume should
shrink to zero as $q$ approaches unity. Accordingly, the optimal storage capacity is
determined in this limit. When $q$ approaches unity the last term in Eq. (11) diverges as
$\sim (1-q)^{-1}$, and we write

$$\int Dt\, \log \int Dh\, \exp\big[\lambda f(H, M) + \hat\lambda\, \partial_M f(H, M)\big] \to \frac{1}{1-q}\int Dt\, \Omega_t(H_t),$$

where the function $\Omega_t(H)$ is given by

$$\Omega_t(H) = -\frac{1}{2}(H+t)^2 + \lambda(1-q) f(H, M) + \hat\lambda(1-q)\, \partial_M f(H, M)$$

and $H_t$ is the value of $H$ leading to the maximum of $\Omega_t$. Therefore $g_2$ also exhibits $(1-q)^{-1}$
divergence:

$$g_2 = \frac{1}{1-q}\, \max_{m < M < 1}\, \min_{\lambda, \hat\lambda} \Big(-\lambda(1-q)M + |\hat\lambda|(1-q) + \int Dt\, \Omega_t(H_t)\Big),$$

and the saddle-point equation over $\lambda$ reads

$$M = \int Dt\, f(H_t, M), \tag{12}$$

where the dependence on the variables $\lambda$ and $\hat\lambda$ is implicit through $H_t(\lambda, \hat\lambda, M; T)$. For the
minimization over $\hat\lambda$, one should consider two cases. The first case is that the minimum
occurs at $\hat\lambda = 0$. This happens when the absolute value of $\int Dt\, \partial_M f(H_t, M)$ with $\hat\lambda = 0$
and $\lambda = \lambda_0$, where $\lambda_0$ is given by the solution of Eq. (12) with $\hat\lambda = 0$, is less than unity.
In the other case, the minimum occurs at $\hat\lambda \neq 0$ and the saddle point in the $(\lambda, \hat\lambda)$ plane is
given by the solution of Eq. (12) together with the equation

$$-\mathrm{sign}(\hat\lambda) = \int Dt\, \partial_M f(H_t, M).$$

In both cases, we denote the saddle point by $(\lambda_0, \hat\lambda_0)$, and write $g_2$ in the form

$$g_2 = -\frac{1}{2(1-q)}\, \min_{m < M < 1} \alpha_0^{-1}(M; T),$$

where

$$\alpha_0^{-1}(M; T) = \int Dt\, \big[t + H_t(\lambda_0, \hat\lambda_0, M; T)\big]^2. \tag{13}$$

Combining the above, we finally obtain the typical fractional volume:

$$V = \exp\Big\{\frac{CN}{2(1-q)}\Big[1 - \alpha \min_{m < M < 1} \alpha_0^{-1}(M; T)\Big]\Big\},$$

which vanishes for

$$\alpha > \alpha_c = \max_{m < M < 1} \alpha_0(M; T). \tag{14}$$



Interestingly, $\alpha_0(M; T)$ also represents the maximum storage capacity for the stationary
value of the overlap $M^*$ being in the range $M < M^* < M + \delta M$. Due to the mean-field
nature of the network, the optimal storage capacity is given by the maximum value of $\alpha_0$
for the given range of $M$.

Since $\alpha_0(M; T)$ defined in Eq. (13) involves minimization with respect to two variables
$(\lambda, \hat\lambda)$ in addition to the two Gaussian integrals over $t$ and $z$ (representing the thermal average),
we computed it numerically. Figure 1 shows typical behavior of $\alpha_0(M; T)$ for several
values of $T$, with detailed behavior for $T = 0.3$ and $0.7$ displayed in the inset. There
exist three types of $M$-dependence of $\alpha_0(M; T)$ according to $T$. When $T$ is higher than
$T_1$ $(\approx 0.566)$, the maximum capacity $\alpha_0$ decreases monotonically with $M$. For $T < T_1$, $\alpha_0$
exhibits a local minimum as well as a local maximum (as shown in the inset of Fig. 1).
This local maximum (at nonzero $M$) is in fact the global maximum of $\alpha_0$ for $T$ lower than
$T_2$ $(\approx 0.414)$, whereas $\alpha_0$ reaches its maximum at $M = 0$ for $T_2 < T < T_1$.

In contrast to the naive expectation, the maximum storage capacity $\alpha_0$ is not monotonic
in $M$ (or in the error allowed) when the temperature is lower than $T_1$. It is of interest
to note that there are two kinds of fluctuations in the dynamics: one is thermal fluctuations
associated with the synaptic noise and controlled by the temperature $T$, while the other
is dynamical fluctuations described by $\sqrt{1-M^2}$. The latter fluctuations come from the
distribution of states with definite overlap $M$ and may be considered to be driven by the
dynamics itself. In general, thermal fluctuations randomize spin orientations and tend to
decrease the capacity, whereas dynamical fluctuations affect the capacity in a more
subtle manner because the level of dynamical fluctuations depends on the overlap. At
zero temperature ($T = 0$) and for $M = 1$ neither thermal fluctuations nor dynamical
fluctuations are present. In this limit, perfect matching is allowed, leading to $\alpha_0 = 2$,
similarly to Gardner's result [3]. However, a small departure from $M = 1$ induces dynamical
fluctuations of the potential of the neurons, so that the maximum storage capacity $\alpha_0$
decreases rapidly as $M$ is reduced. At $T = 0$, as shown in Fig. 1, $\alpha_0$ reaches its maximum
at $M = 1$, and hence we have the optimal storage capacity $\alpha_c = 2$ regardless of the
tolerance parameter $m$. At finite temperatures there always exist thermal fluctuations,
which prohibit perfect matching. In this case it may be expected that allowing some error
(i.e., $m < 1$) increases the capacity. On the other hand, reducing the overlap introduces
dynamical fluctuations and eventually reduces the capacity if the temperature is not too
high ($T < T_1$). Near $M \approx 0$, reduction of the overlap in general increases the capacity
at any temperature since dynamical fluctuations favor small values of the overlap. Here
we stress that one should not expect the divergence of the capacity in the limit $m \to 0$
because the trivial solution $M^* = 0$ is not included.

From the curves of $\alpha_0$ it is straightforward to get the optimal storage capacity $\alpha_c$
defined by Eq. (14) for given tolerance parameter and temperature. Typical behavior of
$\alpha_c$ as a function of $m$ is shown in Fig. 2 at several temperatures. At temperatures higher
than $T_1$ $(\approx 0.566)$, $\alpha_0$ is a monotonically decreasing function of $M$. Consequently, we have
$\alpha_c(m; T) = \alpha_0(M = m; T)$, and the curves of $\alpha_c$ are identical to those of $\alpha_0$. For $T < T_1$,
there appears a plateau on which the optimal storage capacity is constant over some range
of the tolerance parameter. (See Fig. 2. The boundary of this region is displayed by the
dotted line.) For $T < T_2$ $(\approx 0.414)$, it is interesting to note that $\alpha_0$ reaches its maximum
near $M \approx 1$ and that a large value of the overlap is mostly favored.
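The way a plateau in $\alpha_c(m)$ arises from a non-monotonic $\alpha_0(M)$ through Eq. (14) can be illustrated with a few lines of code. The sample values below are hypothetical numbers invented for this sketch (they are not read off Fig. 1); they merely mimic a curve with a local minimum and a local maximum, as described for $T < T_1$.

```python
def optimal_capacity(alpha0_samples, m):
    """alpha_c(m): the largest alpha_0(M) among sampled overlaps with M > m (Eq. (14))."""
    allowed = [alpha0 for M, alpha0 in alpha0_samples if M > m]
    return max(allowed)

# Hypothetical samples (M, alpha_0) with a local minimum near M = 0.6
# and a local maximum near M = 0.9.
samples = [(0.1, 1.30), (0.3, 1.10), (0.5, 0.95), (0.6, 0.90),
           (0.8, 1.00), (0.9, 1.05), (0.95, 0.80), (0.99, 0.30)]

tolerances = (0.0, 0.2, 0.4, 0.55, 0.7, 0.85, 0.93)
curve = [(m, optimal_capacity(samples, m)) for m in tolerances]
```

For these illustrative numbers, $\alpha_c$ stays at the local-maximum value $1.05$ for every tolerance between $0.4$ and $0.85$, which is the plateau, and drops only once $m$ exceeds the position of the local maximum.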

Consider the problem in which we want to store and recall $\alpha C$ random patterns in the
network at temperature $T$ at the best performance, that is, we want the stationary overlap
to be as large as possible. When $\alpha$ is small, one can easily find a set of couplings ($\{J_{ij}\}$) that
yields the stationary value of the overlap near unity. As $\alpha$ increases, it becomes more
difficult to find such a set of couplings. In general the quality of performance will
deteriorate with the storage ratio $\alpha$. Since $\alpha_0(M; T)$ is the maximum storage capacity with the
stationary overlap $M$ at temperature $T$, the best performance $M_p(\alpha; T)$ for given storage
ratio $\alpha$ and temperature $T$ is determined by the largest value of $M$ for which $\alpha_0(M; T)$ is
greater than $\alpha$. This implies $\alpha = \alpha_c(M_p; T)$, and the curve of the best performance also
corresponds to the optimal storage capacity. Therefore Fig. 2 also represents curves of the
best performance with the abscissa and the ordinate denoting $M_p$ and $\alpha$, respectively. As
the number of stored patterns increases, there occurs a first-order transition from good
performance to poor performance at temperatures not too high ($T < T_1$). Interestingly,
at low temperatures ($T < T_2$) the network near saturation naturally favors high-quality
performance; there are no networks yielding low-quality performance.
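The definition of the best performance $M_p(\alpha; T)$ as the largest $M$ with $\alpha_0(M; T) > \alpha$ can be sketched in the same spirit. The sampled curve is again a set of hypothetical numbers chosen only to resemble the case $T < T_1$, and the function name is an assumption of this example.

```python
def best_performance(alpha0_samples, alpha):
    """Largest sampled overlap M with alpha_0(M) > alpha, or None if there is none."""
    feasible = [M for M, alpha0 in alpha0_samples if alpha0 > alpha]
    return max(feasible) if feasible else None

# Hypothetical samples (M, alpha_0) with a local maximum of 1.05 at M = 0.9.
samples = [(0.1, 1.30), (0.3, 1.10), (0.5, 0.95), (0.6, 0.90),
           (0.8, 1.00), (0.9, 1.05), (0.95, 0.80), (0.99, 0.30)]

low_load = best_performance(samples, 0.50)    # high-quality recall
near_peak = best_performance(samples, 1.02)   # still on the high-overlap branch
past_peak = best_performance(samples, 1.07)   # jumps to a small overlap
overloaded = best_performance(samples, 1.50)  # no sampled overlap qualifies
```

With these numbers the best overlap falls abruptly from $0.9$ to $0.3$ once $\alpha$ passes the local maximum of $\alpha_0$, which is a discrete caricature of the first-order transition described above.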



IV. Analysis using approximate criterion 

In this section, we study the proposed scheme with the approximate criterion given
by Eq. (5) instead of Eq. (3) because it is very difficult to solve the dynamics of neural
networks in general. In fact the calculation even with the approximation is not easy, and
we implement the calculation only for extremely-diluted neural networks. The validity of
the approximation will be tested against the result of Sec. III. The fractional volume $V$ of
the space of interactions ($\{J_{ij}\}$) satisfying Eqs. (4) and (5), with the normalization appropriate to the diluted network, is given by

$$V = \frac{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_{\mu=1}^{p} \theta\Big(\frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu \tanh\Big[\frac{\beta}{\sqrt{C}}\sum_j J_{ij}\xi_j^\mu\Big] - m\Big) \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big)}{\int \big[\prod_{i \neq j} dJ_{ij}\big] \prod_i \delta\big(\sum_j (J_{ij})^2 - C\big)}.$$

Although there is no restriction on the correlations between $J_{ij}$ and $J_{ji}$, different sites $i$
and $j$ are not decoupled because of Eq. (5); thereby it is not easy to calculate the typical
fractional volume $V = \exp(\langle\langle \log V \rangle\rangle)$, which involves the average $\langle\langle\ \rangle\rangle$ of $\log V$ over the
distribution of the random patterns $\{\xi_i^\mu\}$. Nevertheless the calculation can be performed
for an extremely-diluted network as discussed in the previous section. In this case the
cumulant expansion can be cut off as before at the second order.

Following a procedure similar to that in Sec. III, we obtain the typical fractional volume
in the form

$$V = \exp\Big\{CN\Big[\frac{1}{2}\log(1-q) + \frac{1}{4}\log(1-x^2) + \frac{q - rx}{2(1-q)(1-x^2)} + \alpha g_2\Big]\Big\}, \tag{15}$$



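where the function $g_2$ in this case is given by

$$g_2 = \frac{1}{N}\int \prod_{i=1}^{N} Dt_i\, \log \int_{-\infty}^{\infty} \frac{dt_0}{\sqrt{2\pi/N}}\, \exp\Big(-\frac{1}{2}N t_0^2\Big)$$
$$\qquad \times \int \prod_{i=1}^{N} Dh_i\; \theta\Big(\frac{1}{N}\sum_{i=1}^{N} \tanh\big[\beta\big(\sqrt{1-q}\,h_i - \sqrt{s-r}\,t_0 - \sqrt{q}\,t_i\big)\big] - m\Big)$$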

with the same notations as in Sec. III. Using the integral representation of the $\theta$-function

$$\theta\Big(\frac{1}{N}\sum_{i=1}^{N} \tanh \beta H_i - m\Big) = \int_m^1 dM \int_{-i\infty}^{i\infty} \frac{d\lambda}{2\pi i/N}\, \exp\Big[-N\lambda\Big(M - \frac{1}{N}\sum_{i=1}^{N} \tanh \beta H_i\Big)\Big]$$

and noting the range of the integral variable $M$, we obtain

$$g_2 = -m\lambda - \frac{1}{2}t_0^2 + \int Dt\, \log\Big(\int Dh\, \exp\big\{\lambda \tanh\big[\beta\big(\sqrt{1-q}\,h - \sqrt{(1-q)x}\,t_0 - \sqrt{q}\,t\big)\big]\big\}\Big),$$



where $t_0$ and $\lambda$ are to be determined by the saddle-point equations. Since $V$ depends on
$s$ only through $x$, it is straightforward to show that $V$ reaches its maximum at $x = 0$ and
$t_0 = 0$.

The optimal storage capacity can be determined according to the condition that the
typical fractional volume shrinks to zero, which happens as $q$ approaches unity. In this
limit, the typical fractional volume given by Eq. (15) has the leading term:

$$V = \exp\Big\{CN\Big[\frac{1}{2}\log(1-q) + \frac{1}{2(1-q)} - \alpha m \lambda + \frac{\alpha}{1-q}\int Dt\, \Omega_t(H_t)\Big]\Big\},$$

where $\Omega_t(H) = -\frac{1}{2}(H+t)^2 + \lambda(1-q)\tanh(\beta H)$ in this case and $H_t$ is again the value of $H$
leading to the maximum of $\Omega_t$. Note that $H_t$ depends on $\lambda(1-q)$ and $T$ in addition to $t$.
Thus, in the limit $q \to 1$ and $\lambda \to \infty$ with $\lambda(1-q)$ fixed, the saddle-point equations read

$$\alpha_c = \Big(\int Dt\, \big\{t + H_t[\lambda(1-q); T]\big\}^2\Big)^{-1} \tag{16a}$$
$$m = \int Dt\, \tanh\big(\beta H_t[\lambda(1-q); T]\big), \tag{16b}$$

where $H_t$ is, by definition, given by the solution of the equation

$$\Omega'_t(H) = -(H+t) + \lambda(1-q)\,\beta\,\big[1 - \tanh^2(\beta H)\big] = 0. \tag{17}$$

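Equation (17) has a unique root unless $\lambda(1-q) > (3\sqrt{3}/4)\,T^2$ and $t_- < t < t_+$. [In this
range Eq. (17) has three roots.] Here $t_\pm$ are defined to be

$$t_\pm = \Omega'_{t=0}\Big[H = -T\tanh^{-1}\Big\{\frac{2}{\sqrt{3}}\cos\Big(\frac{\pi \pm \phi}{3}\Big)\Big\}\Big]$$

with

$$\phi = \cos^{-1}\Big(\frac{3\sqrt{3}\,T^2}{4\lambda(1-q)}\Big).$$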
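The root structure of Eq. (17) can be checked numerically. The following sketch is an illustration only: `a` stands for the combination $\lambda(1-q)$, the closed form for $t_\pm$ is taken as $t_\pm = \Omega'_{t=0}[H = -T\tanh^{-1}\{(2/\sqrt{3})\cos((\pi \pm \phi)/3)\}]$ with $\phi = \cos^{-1}[3\sqrt{3}\,T^2/(4a)]$, and roots are counted by sign changes on a grid, which is a simple device chosen here and not the method used in the paper.

```python
import math

def omega_prime(H, t, a, T):
    """Left-hand side of Eq. (17), with a standing for lambda * (1 - q)."""
    beta = 1.0 / T
    return -(H + t) + a * beta * (1.0 - math.tanh(beta * H) ** 2)

def count_roots(t, a, T, n=20001):
    """Count sign changes of Omega'_t(H) on a uniform grid.
    Every root satisfies 0 <= H + t <= a / T, so the grid covers that window."""
    lo, hi = -t - 1.0, -t + a / T + 1.0
    step = (hi - lo) / (n - 1)
    values = [omega_prime(lo + k * step, t, a, T) for k in range(n)]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0.0)

def t_bounds(a, T):
    """(t_minus, t_plus) from the closed form; requires a > (3*sqrt(3)/4) * T**2."""
    phi = math.acos(3.0 * math.sqrt(3.0) * T * T / (4.0 * a))

    def edge(sign):
        u = 2.0 / math.sqrt(3.0) * math.cos((math.pi + sign * phi) / 3.0)
        return omega_prime(-T * math.atanh(u), 0.0, a, T)

    return edge(-1), edge(+1)

T = 1.0
t_minus, t_plus = t_bounds(3.0, T)       # a = 3 exceeds (3*sqrt(3)/4) T^2
roots_inside = count_roots(2.5, 3.0, T)  # t between t_minus and t_plus
roots_outside = count_roots(4.0, 3.0, T) # t above t_plus
roots_small_a = count_roots(2.5, 1.0, T) # a below the threshold
```

For $T = 1$ and $\lambda(1-q) = 3$ this gives $t_- \approx 2.04$ and $t_+ \approx 3.09$, with three roots only for $t$ between these two values.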

At zero temperature it is straightforward to compute $H_t$ and to write the optimal storage
capacity in the form

$$\alpha_c = \Big(\int_0^{\sqrt{2}\,\mathrm{erf}^{-1}(m)} Dt\; t^2\Big)^{-1},$$

while at finite temperatures Eqs. (16) can be solved numerically.
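The zero-temperature expression can be evaluated in closed form. The sketch below assumes the reading $\alpha_c = \big[\int_0^{c} Dt\, t^2\big]^{-1}$ with $c = \sqrt{2}\,\mathrm{erf}^{-1}(m)$, and uses the elementary identity $\int_0^c Dt\, t^2 = m/2 - c\,\varphi(c)$, where $\varphi$ is the standard normal density; the function name is introduced here for illustration.

```python
from statistics import NormalDist

def alpha_c_zero_temperature(m):
    """Zero-temperature capacity under the approximate criterion, for 0 < m < 1."""
    std = NormalDist()
    c = std.inv_cdf(0.5 * (1.0 + m))       # c = sqrt(2) * erfinv(m)
    integral = 0.5 * m - c * std.pdf(c)    # int_0^c Dt t^2
    return 1.0 / integral

capacity_loose = alpha_c_zero_temperature(0.5)
capacity_tight = alpha_c_zero_temperature(0.9)
capacity_limit = alpha_c_zero_temperature(0.999999)
```

Under this reading the capacity decreases with $m$, approaches 2 as $m \to 1$, and grows without bound as $m \to 0$.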

Figure 3 displays the optimal storage capacity $\alpha_c$ as a function of the tolerance
parameter $m$ at various temperatures. The overall dependence of $\alpha_c$ on $m$ is qualitatively different
from that obtained in Sec. III. In particular, $\alpha_c$ diverges as $m \to 0$. Our approximate
criterion loses its validity near $m = 0$, as it should. However, the $m$-dependence of $\alpha_c$ near
$m = 1$ is not too disparate from that of Fig. 2 at finite temperatures. The $\alpha_c$ curves at
various temperatures shown in Fig. 3 are reproduced to expose the detailed behavior near
$m = 1$ in Fig. 4 which, for comparison, also displays the corresponding curves obtained in
Sec. III. At a given temperature the two curves indeed coincide with each other in the limit
$m \to 1$. Therefore we conclude that the approximate scheme based on Eq. (5) is valid for
$m$ close to unity.



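It is of interest to apply the above scheme to the dynamic model of neural networks [12],
where a neuron is forced to have state $s_i = -1$ during the refractory period. As a
consequence, Gardner's method is not applicable even at zero temperature. In the dynamic
model, equations describing the time evolution of relevant physical quantities in general
assume the form of differential-difference equations due to the retardation in interactions.
In particular, the activity $\langle s_i(t) \rangle$ of the $i$th neuron at time $t$ and the overlap $M_\mu(t)$ between
the network state and the $\mu$th pattern at time $t$ are determined by the differential-difference
equations

$$\frac{d}{dt}\langle s_i(t) \rangle = \Big(\frac{1}{2} - a\Big) - \Big(\frac{1}{2} + a\Big)\langle s_i(t) \rangle + \frac{1}{2}\big(1 - \langle s_i(t) \rangle\big)\tanh\Big(\frac{\beta}{\sqrt{N}}\sum_j J_{ij}\langle s_j(t-1) \rangle\Big)$$

$$\frac{d}{dt}M_\mu(t) = -\Big(\frac{1}{2} + a\Big)M_\mu(t) + \frac{1}{2N}\sum_{i=1}^{N} \xi_i^\mu \big(1 - \langle s_i(t) \rangle\big)\tanh\Big(\frac{\beta}{\sqrt{N}}\sum_j J_{ij}\langle s_j(t-1) \rangle\Big),$$

respectively. Here $a$ represents the ratio of the refractory period to the time duration of
the action potential. In the stationary state, the overlap $M_\mu$ takes the form

$$M_\mu = \frac{4a}{(1+2a)N}\sum_{i=1}^{N} \frac{\xi_i^\mu \tanh\big(\frac{\beta}{\sqrt{N}}\sum_j J_{ij}\langle s_j \rangle\big)}{1 + 2a + \tanh\big(\frac{\beta}{\sqrt{N}}\sum_j J_{ij}\langle s_j \rangle\big)}. \tag{18}$$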

Since $\xi_i^\mu \sum_j J_{ij}\xi_j^\mu$ tends to be positive for typical types of interactions, we may, in the
extreme-dilution limit, make an approximation in Eq. (18) as

$$M_\mu \approx \frac{1}{N}\sum_{i=1}^{N} \frac{4a\tanh\big(\frac{\beta}{\sqrt{N}}\,\xi_i^\mu \sum_j J_{ij}\xi_j^\mu\big)}{(1+2a)^2 - \tanh^2\big(\frac{\beta}{\sqrt{N}}\,\xi_i^\mu \sum_j J_{ij}\xi_j^\mu\big)}.$$

The optimal storage capacity of the dynamic model is now determined by Eqs. (16), except
that the function $\tanh x$ is replaced by $4a\tanh x/[(1+2a)^2 - \tanh^2 x]$. Unlike the
Hopfield model, which discretizes the time, the dynamic model takes into account the
existence of relevant time scales and, consequently, displays the overlap $1/(1+a)$ in
the case of perfect recalling. This is reflected in the equation corresponding to Eq. (16b).
For comparison with the Hopfield model, therefore, we rescale the tolerance parameter $m$
by $\tilde{m} = (1+a)m$, and finally get

$$\alpha_c(\tilde{m}) = \Big(\int_0^{\sqrt{2}\,\mathrm{erf}^{-1}(\tilde{m})} Dt\; t^2\Big)^{-1}$$

at zero temperature.
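The statement that the dynamic model reaches the overlap $1/(1+a)$ at perfect recall follows directly from the replacement function $4a\tanh x/[(1+2a)^2 - \tanh^2 x]$, since $\tanh x \to 1$ gives $4a/(4a + 4a^2)$. A minimal numerical check, with a function name chosen here for illustration:

```python
import math

def dynamic_response(x, a):
    """Replacement for tanh(x) in the dynamic model: 4a tanh x / [(1 + 2a)^2 - tanh^2 x]."""
    th = math.tanh(x)
    return 4.0 * a * th / ((1.0 + 2.0 * a) ** 2 - th * th)

a = 0.5                                        # illustrative refractory ratio
saturated = dynamic_response(50.0, a)          # strong, aligned local field
rescaled = (1.0 + a) * saturated               # rescaling by (1 + a)
```

After rescaling by $(1+a)$ the saturated response equals unity, which is why the rescaled tolerance $(1+a)m$ is the natural quantity to compare with the Hopfield model.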



V. Discussion 

We have proposed a new method to study the optimal storage property of neural
networks at finite temperatures and investigated the optimal storage capacity $\alpha_c$ for an
extremely-diluted network as a function of temperature $T$ and the tolerance parameter $m$.
At zero temperature, it has been found that $\alpha_c = 2$ regardless of the tolerance parameter,
whereas at finite temperature $\alpha_c$ vanishes in the perfect matching limit ($m \to 1$), in
general increasing with the tolerance. The best performance for given storage ratio $\alpha$ and
temperature has also been obtained. At low temperatures ($T < T_1 \approx 0.566$) the network
exhibits a first-order transition from high-quality performance to low-quality performance
as the number of stored random patterns is increased. High-quality performance seems
to be naturally favored by extremely-diluted networks if the level of noise is not too high.
We have also studied an approximate scheme, which yields qualitatively different results
except near $m = 1$. The crude approximation $\langle s_j \rangle \approx \xi_j^\mu$ used in Eq. (5) has been found
not to be good over the whole range of $m$; for $m$ close to unity, however, it can be regarded as an
accurate approximation.

Instead of the proposed criterion for recalling, one may consider a slightly different
criterion: the time average of $\xi_i^\mu s_i(t)$ for each site $i$ should be greater than $m$. In the
same spirit as in Eq. (5), one may consider the problem of calculating the typical fractional
volume of the space of interactions satisfying Eq. (4) and

$$\xi_i^\mu \tanh\Big(\frac{\beta}{\sqrt{N}}\sum_j J_{ij}\xi_j^\mu\Big) > m \tag{19}$$

for each $i$. The problem is then equivalent to Gardner's problem with her parameter $\kappa$
given by $\kappa = T\tanh^{-1} m$, which implies that the required basin of attraction grows larger
with the level of synaptic noise and with the accuracy of recalling. At zero temperature
this leads to the optimal capacity $\alpha_c = 2$ regardless of $m$, whereas at finite temperatures $\alpha_c$
increases with the tolerance. Despite their resemblance, the behavior of the optimal storage
capacity with the criterion (19) is qualitatively different from that of (5). Although the
argument of $\tanh$ in both cases is a sum over many sites $j$, it should be strongly correlated
with $\xi_i^\mu$ if the network ($\{J_{ij}\}$) is to function as associative memory. In general, the overlap
on site $i$, given by the l.h.s. of Eq. (19), will vary from site to site according to some
distribution with finite variance. The criterion (5) in the large $N$ limit requires the overlap
averaged over that distribution to be greater than $m$, while that of (19) demands that the
overlap be greater than $m$ for each site.
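The mapping onto Gardner's problem with $\kappa = T\tanh^{-1} m$ can be made concrete. The sketch below assumes the standard Gardner capacity for continuous couplings and unbiased patterns, $\alpha_c(\kappa) = \big[\int_{-\kappa}^{\infty} Dt\,(t+\kappa)^2\big]^{-1}$, which is not written out in the text above; the integral is evaluated through the closed form $(1+\kappa^2)\Phi(\kappa) + \kappa\varphi(\kappa)$, with $\Phi$ and $\varphi$ the standard normal distribution function and density. The function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

def gardner_capacity(kappa):
    """alpha_c(kappa) = 1 / int_{-kappa}^{inf} Dt (t + kappa)^2 (assumed standard form)."""
    std = NormalDist()
    integral = (1.0 + kappa * kappa) * std.cdf(kappa) + kappa * std.pdf(kappa)
    return 1.0 / integral

def sitewise_capacity(m, T):
    """Capacity under the site-wise criterion (19), using kappa = T * atanh(m)."""
    return gardner_capacity(T * math.atanh(m))

zero_temperature = sitewise_capacity(0.9, 0.0)   # kappa = 0
finite_loose = sitewise_capacity(0.5, 0.5)
finite_tight = sitewise_capacity(0.9, 0.5)
```

At $T = 0$ the stability parameter vanishes for any $m < 1$ and the capacity is 2, while at finite temperature a stricter tolerance (larger $m$) gives a larger $\kappa$ and a smaller capacity, in line with the statements above.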

Finally, there are several points for further investigation. Since we have assumed the 
replica- and site-symmetric ansatz in the calculation of the typical fractional volume, its 
stability against replica-symmetry-breaking should be checked. It is also of interest to 
extend our results to the fully-connected networks and other types of networks. 
Figure Captions

Fig. 1 Typical behavior of the maximum storage capacity $\alpha_0$ as a function of the stationary
overlap $M^*$ at various temperatures: From top to bottom, $T = 0$, 0.3, 0.5, and 0.7,
respectively. Detailed behavior at $T = 0.3$ and 0.5 is displayed in the inset.

Fig. 2 Typical behavior of the optimal storage capacity $\alpha_c$ as a function of the tolerance
parameter $m$ at various temperatures: From top to bottom, $T = 0.3$, 0.5, and 0.7,
respectively. The dotted curve shows the boundary of the region in which $\alpha_c$ is constant.

Fig. 3 The optimal storage capacity $\alpha_c$ as a function of the tolerance parameter $m$ at $T = 0$,
0.3, 0.5, 0.7, and 2.0 when the alternative approximate criterion given by Eq. (5) is
used.

Fig. 4 Detailed behavior of the optimal storage capacity $\alpha_c$ for the tolerance parameter $m$
near unity. Solid lines and dotted lines are results of Eqs. (3) and (5), respectively.